Effects of Stop Words Elimination for AIR
Abstract
The effectiveness of three stop word lists for Arabic Information Retrieval (a General Stoplist, a Corpus-Based Stoplist, and a Combined Stoplist) was investigated in this study. Three popular weighting schemes were examined: the inverse document frequency weight, probabilistic weighting, and statistical language modelling. The idea is to combine statistical approaches with linguistic approaches to reach optimal performance, and to compare their effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data set was used with the Lemur Toolkit. The Best Match (BM25) weighting scheme used in the Okapi retrieval system had the best overall performance of the three weighting algorithms used in the study. Stoplists improved retrieval effectiveness, especially when used with the BM25 weight. The overall performance of the general stoplist was better than that of the other two lists.
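To make the two ideas in the abstract concrete, the following is a minimal Python sketch of stop word elimination followed by BM25 scoring. It is not the study's implementation (which used the Lemur Toolkit); the stoplist, the English placeholder tokens, the function names, and the parameter values k1 = 1.2 and b = 0.75 are all assumptions chosen for illustration, and the IDF uses a common non-negative variant rather than any formula confirmed by the source.

```python
import math
from collections import Counter

# Hypothetical toy stoplist; a real Arabic stoplist would be far larger.
STOPLIST = {"the", "of", "in"}


def remove_stop_words(tokens, stoplist=STOPLIST):
    """Return the tokens that are not in the stoplist, preserving order."""
    return [t for t in tokens if t not in stoplist]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


raw_docs = [
    ["the", "president", "of", "egypt"],
    ["news", "in", "the", "city"],
    ["the", "president", "in", "the", "city"],
]
raw_query = ["the", "president"]

# Apply the same stoplist to documents and query before scoring.
docs = [remove_stop_words(d) for d in raw_docs]
query = remove_stop_words(raw_query)
scores = bm25_scores(query, docs)
```

In this toy setup, filtering removes the stop words from both the query and the documents before scoring, so a function word shared by every document cannot contribute to the ranking or inflate the document lengths.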
Similar resources
Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator
This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...
Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study
The effectiveness of three stop word lists for Arabic Information Retrieval (a General Stoplist, a Corpus-Based Stoplist, and a Combined Stoplist) was investigated in this study. Three popular weighting schemes were examined: the inverse document frequency weight, probabilistic weighting, and statistical language modelling. The idea is to combine the statistical approaches with linguistic approaches ...
Query Term Selection Strategies for Web-based Chinese Factoid Question Answering
Passage retrieval plays an important role in a Chinese factoid Question Answering (QA) system. Query term selection is the process of choosing keywords from a given question to make the most use of information retrieval engines. Query terms selected by humans are analyzed to measure the difficulty and to evaluate machine-generated results. Three approaches, namely stop words elimination, rul...
Photo catalytic removal of Toluene vapor from air in the Adsorption-Photo catalytic bed
Background and aims: Clean air is one of the most important components of health and sustainable development. Every person breathes about 10 kg of air per day and if it contains pollutants, it will have a serious impact on their health. Indoor air quality (IAQ) is one of the major health issues that have been addressed in recent years with changes in lifestyle patterns. Usually, due to the incr...
Removal of carbon monoxide by the cold plasma method
Background and aims: Nowadays, non-thermal plasma is considered a successful new technology with high efficiency in air pollution control and is a focus of researchers' attention. Various types of atmospheric pollutants adversely affect human health and the environment regionally and globally. Carbon monoxide has been introduced as a critical pollutant wh...